Skip to content

fix: evicted transactions could reappear after a node restart#20773

Merged
mrzeszutko merged 2 commits intomerge-train/spartanfrom
mrzeszutko/fix-tx-come-back-to-life
Feb 24, 2026
Merged

fix: evicted transactions could reappear after a node restart#20773
mrzeszutko merged 2 commits intomerge-train/spartanfrom
mrzeszutko/fix-tx-come-back-to-life

Conversation

@mrzeszutko
Copy link
Contributor

Summary

  • Fix a bug in TxPoolV2.addPendingTxs where evicted transactions could reappear after a node restart ("come back to life")
  • Move all throwable I/O (buildTxMetaData, getMinedBlockId, validateMeta) out of the transactionAsync callback into a pre-computation phase, so that if any I/O fails, no in-memory or DB mutations have occurred

Background

addPendingTxs processes a batch of transactions inside a single LMDB transactionAsync. When tx N causes nullifier-conflict evictions of earlier pool txs and then tx N+1 triggers a throw (e.g. getTxEffect I/O failure or validator crash), the LMDB transaction rolls back — but in-memory mutations from the eviction persist. On restart the pool rehydrates from DB where the soft-delete marker was never committed, and the evicted tx loads back into the pool.

Other pool methods (prepareForSlot, handlePrunedBlocks, handleFinalizedBlock, etc.) are not affected because they either perform throwable I/O before any deletions, or wrap throwable operations in try/catch via the EvictionManager.

Approach (TDD)

The fix was developed test-first. Two failing tests were written that reproduce the inconsistency — one for each throwable path (getMinedBlockId and validateMeta) — by verifying that getTxStatus returns the same value before and after a pool restart. With the bug present, the status diverges ('deleted' in memory vs 'pending' after restart). The implementation was then applied to make both tests pass.

Implementation

addPendingTxs is now split into two phases:

  1. Pre-computation (outside the transaction): For each tx, compute buildTxMetaData, getMinedBlockId, and validateMeta. If any of these throw, the call fails before any mutations happen.
  2. Transaction (inside transactionAsync): Uses only pre-computed results, in-memory index reads, and buffered DB writes.

Supporting changes:

  • #addTx accepts an optional precomputedMeta parameter (backward-compatible — other call sites unchanged)
  • #tryAddRegularPendingTx receives pre-computed metadata directly instead of calling buildTxMetaData/validateMeta; validation rejection is handled by the caller

Test plan

  • Two new persistence consistency tests pass (were failing before the fix)
  • All 210 tx_pool_v2.test.ts tests pass
  • Full p2p test suite passes (1292 tests)

Copy link
Collaborator

@PhilWindle PhilWindle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!

@mrzeszutko mrzeszutko merged commit 0da134d into merge-train/spartan Feb 24, 2026
10 checks passed
@mrzeszutko mrzeszutko deleted the mrzeszutko/fix-tx-come-back-to-life branch February 24, 2026 09:09
AztecBot pushed a commit that referenced this pull request Feb 24, 2026
## Summary

- Fix a bug in `TxPoolV2.addPendingTxs` where evicted transactions could reappear after a node restart ("come back to life")
- Move all throwable I/O (`buildTxMetaData`, `getMinedBlockId`, `validateMeta`) out of the `transactionAsync` callback into a pre-computation phase, so that if any I/O fails, no in-memory or DB mutations have occurred

## Background

`addPendingTxs` processes a batch of transactions inside a single LMDB `transactionAsync`. When tx N causes nullifier-conflict evictions of earlier pool txs and then tx N+1 triggers a throw (e.g. `getTxEffect` I/O failure or validator crash), the LMDB transaction rolls back — but in-memory mutations from the eviction persist. On restart the pool rehydrates from DB where the soft-delete marker was never committed, and the evicted tx loads back into the pool.

Other pool methods (`prepareForSlot`, `handlePrunedBlocks`, `handleFinalizedBlock`, etc.) are not affected because they either perform throwable I/O *before* any deletions, or wrap throwable operations in try/catch via the EvictionManager.

## Approach (TDD)

The fix was developed test-first. Two failing tests were written that reproduce the inconsistency — one for each throwable path (`getMinedBlockId` and `validateMeta`) — by verifying that `getTxStatus` returns the same value before and after a pool restart. With the bug present, the status diverges (`'deleted'` in memory vs `'pending'` after restart). The implementation was then applied to make both tests pass.

## Implementation

`addPendingTxs` is now split into two phases:

1. **Pre-computation** (outside the transaction): For each tx, compute `buildTxMetaData`, `getMinedBlockId`, and `validateMeta`. If any of these throw, the call fails before any mutations happen.
2. **Transaction** (inside `transactionAsync`): Uses only pre-computed results, in-memory index reads, and buffered DB writes.

Supporting changes:
- `#addTx` accepts an optional `precomputedMeta` parameter (backward-compatible — other call sites unchanged)
- `#tryAddRegularPendingTx` receives pre-computed metadata directly instead of calling `buildTxMetaData`/`validateMeta`; validation rejection is handled by the caller

## Test plan

- [x] Two new persistence consistency tests pass (were failing before the fix)
- [x] All 210 `tx_pool_v2.test.ts` tests pass
- [x] Full p2p test suite passes (1292 tests)
@AztecBot
Copy link
Collaborator

✅ Successfully backported to backport-to-v4-staging #20752.

alexghr added a commit that referenced this pull request Feb 25, 2026
BEGIN_COMMIT_OVERRIDE
chore: fix worker wallet in jest (#20725)
fix: pxe native prover log level (#20724)
chore: encode LOG_LEVEL (#20678)
chore: fix worker wallet log level (#20754)
chore: Removed dead code (#20740)
fix: getVotes return empty instead of stale data (#20756)
fix: pass log level to AVM simulator (#20762)
chore: double Node.js libuv thread count (#20709)
fix: underflow in snapshot synch (#20780)
fix: separate rejected and aborted proving jobs (#20777)
fix: evicted transactions could reappear after a node restart (#20773)
fix: (A-575) prevent checkpoint event spam in L2BlockStream on restart
(#20791)
fix: charge 3.6M for epoch verification (#20765)
fix(ethereum): remove viem NonceManager to fix nonce gap after failed
sends (#20819)
fix: limit number of threads when verifying server-side proofs (#20818)
chore: Comment update (#20820)
feat: add support for signed integers on contract functions (#20784)
feat: expose blockheader getters (#20790)
docs: minor clarification (#20788)
fix: Use async poseidon (#20826)
chore: update rollup config for testnet and mainnet (#20827)
END_COMMIT_OVERRIDE

---------

Co-authored-by: alexghr <alexghr@users.noreply.github.com>
Co-authored-by: PhilWindle <PhilWindle@users.noreply.github.com>
Co-authored-by: LHerskind <LHerskind@users.noreply.github.com>
Co-authored-by: ludamad <ludamad@users.noreply.github.com>
Co-authored-by: mrzeszutko <mrzeszutko@users.noreply.github.com>
Co-authored-by: danielntmd <danielntmd@users.noreply.github.com>
Co-authored-by: PhilWindle <60546371+PhilWindle@users.noreply.github.com>
Co-authored-by: spypsy <spypsy@users.noreply.github.com>
Co-authored-by: nventuro <nventuro@users.noreply.github.com>
Co-authored-by: aminsammara <aminsammara@users.noreply.github.com>
Co-authored-by: Phil Windle <philip.windle@gmail.com>
github-merge-queue bot pushed a commit that referenced this pull request Feb 26, 2026
BEGIN_COMMIT_OVERRIDE
fix: evicted transactions could reappear after a node restart (#20773)
fix: (A-575) prevent checkpoint event spam in L2BlockStream on restart
(#20791)
chore: New alpha-net environment (#20800)
chore: increase testnet validators (#20809)
chore: Configs for next-net, testnet etc (#20803)
fix(ethereum): remove viem NonceManager to fix nonce gap after failed
sends (#20819)
fix: limit number of threads when verifying server-side proofs (#20818)
chore: Comment update (#20820)
chore: Temper noisy log line (#20821)
feat: read logs from s3, pass on dashboard, health report (#20507)
fix: Use async poseidon (#20826)
chore: update rollup config for testnet and mainnet (#20827)
fix: Reduce tx hash conversions inside tx pool (#20829)
chore: backport skill (#20851)
feat: use native crypto to compute p2p message ID (#20846)
feat(spartan): (A-535) enable admin API key auth for next-scenario
network (#20738)
chore: increase max fee bots use in tests (#20867)
refactor(aztec-node): skip unnecessary subsystems in prover-only mode
and remove P2PClientType enum (#20848)
feat(blob-client): (A-574) replace deprecated blob_sidecars API and
reduce call overhead (#20845)
fix: deflake batch tx req (#20861)
END_COMMIT_OVERRIDE
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants